Applying Context-Based Prediction in Adversarial Watkins’ Q(λ)-Learning
Abstract
This paper exhibits the transformation of Watkins’ Q(λ) learning algorithm into an adversarial Q-learning algorithm. A method called context-based prediction, borrowed from multimedia data coding, is used for opponent modeling and is incorporated into the transformed algorithm, CBQ(λ). We tested CBQ(λ) against three opponents. The first had no prior knowledge and discovered policies as it played. The second carried prior knowledge and used a fixed policy. The third carried prior knowledge and continued to improve its prior policy as play progressed. CBQ(λ) performed well against all three opponents in tic-tac-toe when given 10 seconds of simulation time.
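CBQ(λ) builds on the tabular Watkins’ Q(λ) update, in which eligibility traces propagate the one-step TD error back to recently visited state-action pairs and are cut whenever an exploratory (non-greedy) action is taken. As a hedged sketch of that base algorithm only (the environment interface, hyperparameters, and greedy tie-breaking here are our own illustrative assumptions, not details from the paper):

```python
import random
from collections import defaultdict

def watkins_q_lambda(env_step, actions, start_state=0, episodes=200,
                     alpha=0.1, gamma=0.9, lam=0.8, epsilon=0.1, seed=0):
    """Tabular Watkins' Q(lambda) with accumulating eligibility traces.

    Traces are reset to zero after any exploratory (non-greedy) action,
    which is the defining feature of Watkins' variant.
    """
    rng = random.Random(seed)
    Q = defaultdict(float)              # Q[(state, action)] -> value
    for _ in range(episodes):
        e = defaultdict(float)          # eligibility traces, per episode
        s, done = start_state, False
        while not done:
            # epsilon-greedy action selection (ties broken by action order)
            greedy_a = max(actions, key=lambda x: Q[(s, x)])
            a = rng.choice(actions) if rng.random() < epsilon else greedy_a
            s2, r, done = env_step(s, a)
            best_next = max(Q[(s2, x)] for x in actions)
            delta = r + (0.0 if done else gamma * best_next) - Q[(s, a)]
            e[(s, a)] += 1.0            # accumulating trace for visited pair
            for key in list(e):
                Q[key] += alpha * delta * e[key]
                # Watkins' cut: traces die after a non-greedy action
                e[key] = 0.0 if a != greedy_a else gamma * lam * e[key]
            s = s2
    return Q
```

On a small chain environment, the learned Q-values come to favor actions leading toward the rewarding terminal state; the adversarial CBQ(λ) variant described in the abstract would additionally condition its play on a context-based prediction of the opponent’s moves.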
Similar resources
Temporal Second Difference Traces
Q-learning is a reliable but inefficient off-policy temporal-difference method, backing up reward only one step at a time. Replacing traces, using a recency heuristic, are more efficient but less reliable. In this work, we introduce model-free, off-policy temporal difference methods that make better use of experience than Watkins’ Q(λ). We introduce both Optimistic Q(λ) and the temporal second ...
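The snippet above contrasts one-step backups with replacing traces. As an illustrative sketch (the function name and values are ours, not from the paper), the replacing-trace rule differs from the accumulating rule only in how the trace of a revisited state-action pair is set:

```python
def update_trace(e, key, kind="accumulating"):
    """Update the eligibility trace for the state-action pair just visited."""
    if kind == "accumulating":
        e[key] = e.get(key, 0.0) + 1.0   # traces can grow past 1 on revisits
    elif kind == "replacing":
        e[key] = 1.0                     # recency heuristic: reset to 1
    return e

# Revisiting the same pair twice under each rule:
acc = update_trace(update_trace({}, ("s", "a")), ("s", "a"))
rep = update_trace(update_trace({}, ("s", "a"), "replacing"),
                   ("s", "a"), "replacing")
print(acc[("s", "a")], rep[("s", "a")])  # → 2.0 1.0
```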
Opposition-Based Q(λ) with Non-Markovian Update
The OQ(λ) algorithm benefits from an extension of eligibility traces introduced as the opposition trace. This new technique combines the idea of opposition with eligibility traces to deal with large state-space problems in reinforcement learning applications. In our previous work, the comparison of the results of OQ(λ) and conventional Watkins’ Q(λ) reflected a remarkable increase in perf...
Safe and Efficient Off-Policy Reinforcement Learning
In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning. Expressing these in a common form, we derive a novel algorithm, Retrace(λ), with three desired properties: (1) it has low variance; (2) it safely uses samples collected from any behaviour policy, whatever its degree of “off-policyness”; and (3) it is efficient as it makes the b...
Multi-Legged Robot Control Using GA-Based Q-Learning Method With Neighboring Crossover
Recently, reinforcement learning has received much attention as a learning method (Sutton, 1988; Watkins & Dayan, 1992). It does not need a priori knowledge and has a high capability for reactive and adaptive behaviors. However, there are some significant problems in applying it to real problems, among them the high cost of learning and the large size of the action-state space. The Q-learning (Watkins &...
Disguise Adversarial Networks for Click-through Rate Prediction
We introduced an adversarial learning framework for improving CTR prediction in Ads recommendation. Our approach was motivated by the extremely low click-through rate and imbalanced label distribution observed in historical Ads impressions. We hence proposed Disguise-Adversarial Networks (DAN) to improve the accuracy of supervised learning with limited positive-class information. In the c...
Publication date: 2011